04. TD Control: Sarsa
Monte Carlo (MC) control methods require us to complete an entire episode of interaction before updating the Q-table. Temporal-Difference (TD) methods instead update the Q-table after every time step.
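One way to see the contrast in update-rule terms: constant-$\alpha$ MC control must wait until the episode ends to compute the return $G_t$ before applying $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( G_t - Q(S_t, A_t) \right)$, whereas a TD method replaces $G_t$ with an estimate it can form immediately from the very next step of experience.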
## Video
TD Control Sarsa Part 1
Watch the next video to learn about Sarsa (or Sarsa(0)), one method for TD control.
## Video
TD Control Sarsa Part 2
## Pseudocode

In the algorithm, the number of episodes the agent collects is equal to `num_episodes`. For every time step $t \geq 0$, the agent:
- takes the action $A_t$ (from the current state $S_t$) that is $\epsilon$-greedy with respect to the Q-table,
- receives the reward $R_{t+1}$ and next state $S_{t+1}$,
- chooses the next action $A_{t+1}$ (from the next state $S_{t+1}$) that is $\epsilon$-greedy with respect to the Q-table,
- uses the information in the tuple ($S_t$, $A_t$, $R_{t+1}$, $S_{t+1}$, $A_{t+1}$) to update the entry $Q(S_t, A_t)$ in the Q-table corresponding to the current state $S_t$ and the action $A_t$, as in the update rule and sketch below.
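This tuple of experience gives the method its name: Sarsa is short for State-Action-Reward-State-Action. The update applied at each step is the one-step Sarsa rule

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right)$$

where $\alpha$ is the step-size parameter and $\gamma$ is the discount rate.

Below is a minimal Python sketch of the loop just described. It assumes an environment exposing the classic OpenAI Gym interface (`env.reset()` returns a state, and `env.step(action)` returns `(next_state, reward, done, info)`); the decaying schedule `eps = 1 / i_episode` is just one common choice of exploration schedule, not part of the algorithm itself.

```python
import numpy as np
from collections import defaultdict

def epsilon_greedy(Q, state, nA, eps):
    """Select an action epsilon-greedily with respect to Q[state]."""
    if np.random.random() > eps:
        return int(np.argmax(Q[state]))    # exploit: greedy action
    return np.random.randint(nA)           # explore: uniform random action

def sarsa(env, num_episodes, alpha, gamma=1.0):
    """One-step Sarsa control (a sketch; assumes the classic Gym API)."""
    nA = env.action_space.n
    Q = defaultdict(lambda: np.zeros(nA))  # Q-table, zero-initialized
    for i_episode in range(1, num_episodes + 1):
        eps = 1.0 / i_episode              # decaying epsilon (one common schedule)
        state = env.reset()
        action = epsilon_greedy(Q, state, nA, eps)
        done = False
        while not done:
            next_state, reward, done, _ = env.step(action)
            if not done:
                next_action = epsilon_greedy(Q, next_state, nA, eps)
                # Sarsa update from the tuple (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})
                Q[state][action] += alpha * (reward + gamma * Q[next_state][next_action]
                                             - Q[state][action])
                state, action = next_state, next_action
            else:
                # the Q-value of any terminal state is zero by convention
                Q[state][action] += alpha * (reward - Q[state][action])
    return Q
```

Note that the action in the update target, $A_{t+1}$, is the action the agent actually takes next. This is what makes Sarsa an on-policy method: it evaluates and improves the same $\epsilon$-greedy policy that it uses to collect experience.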